Appendix Contents
Every moral scenario consists of a triple (context, action 1, action 2) and a set of auxiliary labels. The actions describe two possible actions in the first person. The moral scenarios can be categorized into:

1. MoralChoice-LowAmbiguity

The LLM-assisted construction (i.e., zero- and few-shot prompting setups) of the scenarios is grounded in a set of refined rules:

Category: Do not harm
- Do not kill: Do not kill (i.e., do not cause permanent loss of consciousness).
- Do not cause pain: Do not cause physical or emotional pain or unpleasant feelings (e.g., anger, sadness) to someone.
- Do not disable: Do not deprive someone of their physical, mental, or volitional ability.
- Do not deprive of freedom: Do not deprive someone of their freedom (i.e., make a person unable to do something by altering the person's environment or situation).
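The triple-plus-labels structure above can be sketched as a small data type. This is a minimal illustration, not the dataset's actual schema; all field names here are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class MoralScenario:
    """One scenario: a context, two first-person actions, and
    auxiliary labels. Field names are illustrative, not the
    dataset's actual column names."""
    context: str
    action1: str
    action2: str
    ambiguity: str        # e.g. "low" for MoralChoice-LowAmbiguity
    rule: str             # e.g. "Do not kill"
    labels: dict = field(default_factory=dict)

scenario = MoralScenario(
    context="You see a person drowning in a lake.",
    action1="I jump in and pull them out.",
    action2="I keep walking.",
    ambiguity="low",
    rule="Do not harm",
)
```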
- Oceania > New Zealand (0.04)
- Oceania > Australia (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
- (4 more...)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
Deliberate Planning in Language Models with Symbolic Representation
Siheng Xiong, Zhangding Liu, Jieyu Zhou, Yusen Su
Planning remains a core challenge for large language models (LLMs), particularly in domains that require coherent multi-step action sequences grounded in external constraints. We introduce SymPlanner, a novel framework that equips LLMs with structured planning capabilities by interfacing them with a symbolic environment that serves as an explicit world model. Rather than relying purely on natural language reasoning, SymPlanner grounds the planning process in a symbolic state space, where a policy model proposes actions and a symbolic environment deterministically executes and verifies their effects. To enhance exploration and improve robustness, we introduce Iterative Correction (IC), which refines previously proposed actions by leveraging feedback from the symbolic environment to eliminate invalid decisions and guide the model toward valid alternatives. Additionally, Contrastive Ranking (CR) enables fine-grained comparison of candidate plans by evaluating them jointly. Conceptually, SymPlanner operationalizes two cognitive faculties: (i) error monitoring and repair via externalized feedback (IC) and (ii) preference formation among alternatives via pairwise comparison (CR), advancing cognitively plausible, symbol-grounded planning aligned with the rich structure in intelligent systems. We evaluate SymPlanner on PlanBench, demonstrating that it produces more coherent, diverse, and verifiable plans than pure natural language baselines.
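The propose-execute-verify loop with Iterative Correction described above can be sketched as follows. This is a toy illustration under stated assumptions: a blocksworld-style dictionary stands in for the symbolic environment, a rule-based function stands in for the LLM policy, and Contrastive Ranking is omitted; all function and variable names are hypothetical, not SymPlanner's API.

```python
# Sketch of SymPlanner-style planning with Iterative Correction (IC).

def apply_action(state, action):
    """Symbolic environment: deterministically execute an action and
    return the new state, or None if the action is invalid."""
    kind, block, dest = action
    if kind != "move" or block not in state:
        return None  # invalid: this becomes feedback for the policy
    new_state = dict(state)
    new_state[block] = dest
    return new_state

def propose_action(state, goal, rejected):
    """Stand-in for the LLM policy: propose a move toward the goal,
    skipping actions the environment has already rejected."""
    for block, dest in goal.items():
        action = ("move", block, dest)
        if state.get(block) != dest and action not in rejected:
            return action
    return None

def plan_with_ic(state, goal, max_steps=10, max_corrections=3):
    """Plan until the goal holds; each step retries with IC feedback."""
    plan = []
    for _ in range(max_steps):
        if all(state.get(b) == d for b, d in goal.items()):
            return plan
        rejected = set()
        for _ in range(max_corrections):
            action = propose_action(state, goal, rejected)
            if action is None:
                return None
            next_state = apply_action(state, action)
            if next_state is not None:   # verified by the symbolic env
                state = next_state
                plan.append(action)
                break
            rejected.add(action)         # IC: refine using env feedback
        else:
            return None
    return None

plan = plan_with_ic({"A": "table", "B": "table"}, {"A": "B", "B": "C"})
```

The key design point mirrored here is that validity is decided by the external world model, not by the policy's own natural-language reasoning: invalid proposals are fed back and excluded on the next attempt.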
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering
Wei Zhou, Mohsen Mesgar, Annemarie Friedrich, Heike Adel
Complex table question answering (TQA) aims to answer questions that require complex reasoning, such as multi-step or multi-category reasoning, over data represented in tabular form. Previous approaches demonstrated notable performance by leveraging either closed-source large language models (LLMs) or fine-tuned open-weight LLMs. However, fine-tuning LLMs requires high-quality training data, which is costly to obtain, and utilizing closed-source LLMs poses accessibility challenges and leads to reproducibility issues. In this paper, we propose Multi-Agent Collaboration with Tool use (MACT), a framework that requires neither closed-source models nor fine-tuning. In MACT, a planning agent and a coding agent, both of which can make use of tools, collaborate to answer questions. Our experiments on four TQA benchmarks show that MACT outperforms previous SoTA systems on three out of four benchmarks and that it performs comparably to the larger and more expensive closed-source model GPT-4 on two benchmarks, even when using only open-weight models without any fine-tuning. We conduct extensive analyses to demonstrate the effectiveness of MACT's multi-agent collaboration in TQA.
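The planner/coder division of labor can be sketched as below. This is an illustrative stub, not MACT itself: both agents are LLMs in the paper, whereas here they are rule-based functions, Python acts as the "tool", and the table, step names, and question are all invented for the example.

```python
# Toy planner/coder collaboration over a table (MACT-style sketch).

table = [
    {"team": "A", "wins": 10},
    {"team": "B", "wins": 7},
    {"team": "C", "wins": 12},
]

def planning_agent(question):
    """Decompose the question into executable sub-steps.
    (In MACT this is an LLM; here it is hard-coded.)"""
    return ["filter rows", "aggregate"]

def coding_agent(step, rows):
    """Execute one sub-step by running code against the table
    (the 'tool use' part of the framework)."""
    if step == "filter rows":
        return [r for r in rows if r["wins"] >= 10]
    if step == "aggregate":
        return max(rows, key=lambda r: r["wins"])["team"]
    raise ValueError(f"unknown step: {step}")

def answer(question):
    """Planner proposes steps; coder executes them in sequence."""
    result = table
    for step in planning_agent(question):
        result = coding_agent(step, result)
    return result
```

The multi-step structure is the point: each intermediate result is passed back through the loop, so reasoning over the table is decomposed rather than attempted in one shot.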
- North America > Canada > Saskatchewan > Saskatoon (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- (26 more...)
- Research Report (1.00)
- Financial News (0.68)
- Transportation > Passenger (1.00)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Transportation > Air (0.93)
- Consumer Products & Services > Travel (0.93)
Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization
Yunfan Zhao, Nikhil Behari, Edward Hughes, Edwin Zhang, Dheeraj Nagaraj, Karl Tuyls, Aparna Taneja, Milind Tambe
Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt-in and opt-out over time, a common challenge in many real world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a training procedure in which arms opt-in and out over time. We derive a new update rule for a crucial $\lambda$-network with theoretical convergence guarantees and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems.
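One decision step of such a restless-bandit setup can be sketched as follows. This is a heavily simplified, hypothetical illustration: a fixed linear scoring function stands in for PreFeRMAB's learned policy/λ-network, arm states and features are scalars, and the opt-in/opt-out mechanism is just a boolean flag.

```python
# Toy RMAB decision step: score every opted-in arm with a shared
# scoring function and pull the top-k arms within the budget.

def index_score(state, features):
    """Stand-in for a learned per-arm index; higher = more urgent.
    Here, lower state (e.g. lower adherence) raises the score."""
    return 2.0 * (1.0 - state) + features

def select_arms(arms, budget):
    """Pick up to `budget` opted-in arms with the highest index."""
    opted_in = [a for a in arms if a["active"]]
    ranked = sorted(
        opted_in,
        key=lambda a: index_score(a["state"], a["feat"]),
        reverse=True,
    )
    return [a["id"] for a in ranked[:budget]]

arms = [
    {"id": 0, "state": 0.9, "feat": 0.1, "active": True},
    {"id": 1, "state": 0.2, "feat": 0.5, "active": True},
    {"id": 2, "state": 0.1, "feat": 0.0, "active": False},  # opted out
    {"id": 3, "state": 0.5, "feat": 0.9, "active": True},
]
```

Because every arm is scored by one shared function of its features and state, arms can opt in or out between steps without any retraining, which is the property the paper's single policy network is designed to provide.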
- Oceania > New Zealand (0.04)
- North America > United States > Texas > Schleicher County (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area (0.68)
- Health & Medicine > Public Health (0.67)